{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 用线性方法处理分类问题——逻辑回归"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "实际上线性模型也可以用于分类任务。方法是把一个线性模型拟合成某个类型的概率分布，然后用一个函数建立阈值来确定结果属于哪一类。\n",
    "\n",
    "<!-- TEASER_END -->"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting ready"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里用的函数是经典的逻辑函数。一个非常简单的函数：\n",
    "\n",
    "$$f(x)= \\frac 1 {1+e^{-t}}$$\n",
    "\n",
    "它的图形如下图所示："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlEAAAFCCAYAAADR1oh2AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYVNWd//H3V9x3BeMGuCRoNIlLEtEZxbQ7Q4wkMeO+\n5TeJxowxiRod15BMHMeJWzQaE9e4YtyNxl3RiCKgKC644IIgIoqgIChLn98fp1qatjea6r7VVe/X\n89ynqrpuVX2raLo/fb7nnhspJSRJkrR4liq6AEmSpO7IECVJktQBhihJkqQOMERJkiR1gCFKkiSp\nAwxRkiRJHWCIkipMRPwpIk7pwOP6RsTMiIjOqKszRcQ/IuLgTnjeKyPiv8v9vJUiIt6MiNkR8deC\nXv/K0utPLOL1paIZoqQlUPoltks5nzOldGRK6XftfO2dGz3urZTSKmkxF3+LiMMiYkEpgDVs53ek\n9na+3pCIuLrx11JKg1JKV7f0mCWQStti6yYBLAF7ppQObWmH0uc9r9G/7UcRsWF7njwi6iKivsn3\nxmdhN6V0GPBvS/gepG5r6aILkLq5Dv+SLtNrl2vUaXhKaccyPVel6XYjc2WWgOtTSod08PFvp5T6\ntHJ/rX++qmGOREmdICKWi4jzIuLt0nZuRCzb6P7jI2JyREyKiB+V/trfuHTfZyMgEdErIu6MiOkR\nMS0iHo3saqAv8PfS6MBxEbFh6XmWKj12zYi4ovT6H0TEra2V3Mx7OCwi/tnka03rvLBU30cRMaLh\nvtL9X4mI+0t1T4mIEyNiD+BEYN9S3WNK+w6LiP8oXY+IOKU00vZuRPw1IlYt3dfwHg+JiAkR8V5E\nnNTGP0eviLivVOOwiOjbqMYvN6rxpYj499LXDwcOAI4v1XlH6fO4o9FjX42IvzW6PTEitmjteUv3\nLRcRZ5XqnxK5fbt86b660vfEMaX3PjkiDmvj/bUlMOhIncIQJXWOk4H+wJalrT9wCkBEDAR+CewC\n9APqmjy28ejWscBEoBfwBeDElB0MvEVu5aySUjqrmRquBpYHNi899pxyvblG9gWGAGsA44HTASJi\nFeAB4B/AusCXgAdTSvcC/wMMLdW9del5Gr/nHwKHkj+XjYGVgT82ed3tgU3In+FpEfHlFuoL4EDg\nt+TP8Bng2lKNKwH3A9cAawH7ARdFxGYppb+U9juzVOdewCPAgNJj1wOWAbYr3d4YWCmlNLa15y3V\n9L+lz2PL0uX6wGmNal4bWBVYD/gP4MKIWK2F99ceCfhOKdA9HxE/WczHf6EU9l6PiHMiYsUlqEWq\nKoYoqXMcAPw2pfR+Sul94DdAw1ySfYDLU0rjUkpzgF+38jxzySFkw5TSgpTS8Pa8eESsCwwEfpJS\n+jClND+l9M9WHrJdabRremnUatt2vEwCbkkpjU4pLSCHjq1K9+0JTE4pnZtSmptSmpVSGtlQHq2P\njBwInJ1SejOl9DF55Gq/hhG2kt+klD5NKY0FniUHkpbcmVJ6LKU0lxxu/yUiepdqfCOl9NeUUn1K\n6RngFqBh1GiROlNKbwAzI2JrYEfgXmByRGwKfAt4tNF7b/Z5IyKAHwPHpJRmpJRmAWeQg1aDeeTv\nnQUppbuBWcCmrby/tvwN+DI5RP6YHDr3a/0hnxkHbJlSWgfYGfgGnRPGpW7JECV1jvWACY1uv1X6\nGuRQ1PhopknNPL7hl/fvySM890XEaxFxQjtfvw/wQUrpw3buPyKltEZpWzOl9GQ7H/duo+tzyKNG\nDa//ejufo6l1+fxntzR5hKbBlEbXZwMrtfBciUafbymUfUD+t9gA2LZReJxODr9rN3psU4+QR8gG\nlK4/Qg5QO5au08bz9gJWBJ5qdN/dpa83mJZSqm/y/lamHSLipFg4Afyi0nsel1KaUhrBfAL4A/CD\n9jxfSundlNJLpetvAscDe7fnsVItcGK51DkmAxuS/5KHPH/p7dL1d8gho0GLk3ZLIxXHAcdFxFeA\nhyJiZErpYVqf0D4RWDMiVluMINXUx+Rf+ABExDqL8di3yK2+5tS38PUGDZ9dg77AfHJg69vcA9rw\n2ecbESsDa5L/Ld4CHkkp7d7C41oKUXuV6jsdmAEcRG7rXVDap8XnLY2mzQE2Tym904H30qqU0v+Q\n26WdyT++pRL/M0hLbtmIWL7RtjRwPXBK5InhvchzXq4p7f834IelyccrAqc2eb7PWkgRsWdEfKnU\nBvoIWMDCEPIu8MXmCir9gr6bPBdn9YhYJiIW9+i7Z4GvRMSWpYnPQ1qqsxl3AetGxM9LE6lXiYj+\njeresPSemnM98MvSJPKVWTiHqrXw1dJzBTAoIraPPLH/v4EnUkpvl2rcJCIOKn0+y0TENo3mV71L\nnpPV2CPATsDyKaXJwGPktumawJjSPne29Lyl93AJcF5ErAUQEetHREtBbolFxOCIWCOy/sDRwO2N\n7h8WEc22lEsT3TcoPbYPcCZwW2fVKnU3hihpyf2D3HJp2E4DfgeMBsaWttGlr5FSugc4H3gYeAV4\novQ8n5YuG0+y/hJ5kvJM4HHgwpRSQ9voDHJQmx4RxzR6bIODyfNrXiIHgqNbqL/ZZRpSSq+QJ2Q/\nALwM/LPJfs09LpUeOxPYDfgOeeTtFRZOoL+xdDktIkY3U8/l5Enxj5JbgrOBnzV9jeZet4WvX0ue\ndzYN2Jo8ctRQ4+7k+Uhvl+o8A2g4ivIyYPPS53tL6TGvkv8t/lm6/RHwGnmJiIb3PquN5z2B3KId\nEREfkv99N2nHe2lJW0fe7Qu8Sg7hfwXOaLImV29yGGzO1sBw8rys4eSJ+S19H0k1J9paly8iLge+\nDUxNKX2thX3OJy+4Nhs4LKU0prn9JH1e6ait54Bl2xhtkRYRES+R55DdklL6YQce35s8yrdDB1//\nMvL8qndTSpu0tb9UbdoTogaQ/wq5qrkQFRGDgKNSSoNKR/T8IaW0XadUK1WJiPgeeQRrRfLowPyU\n0veLrUqStDjabOeVDoue3soue5F/CVA6omf1iFi7lf0lweHkFtt4csvtyGLLkSQtrnIcnbc+nz9c\nuzeLHvosqZGUkucbk6RurlwTy5tObCzqXGKSJEldohwjUW+z6Do3vVm4Hs5nIsJgJUmSuo2UUqtH\nv5YjRN0BHAUMjYjtgBkppWZbeW1NYld5DRkyhCFDhhRdRk3xM+96fuZdz8+86xXxmc+fDzNnwkcf\nfX6bOTNvs2bBxx8v3BrfnjULImDUqC4tu2xaXspuoTZDVERcTz6tQa+ImEheb2UZgJTSn1NK/4iI\nQRExnrzC8WIfZitJkspvwQKYPh3eey9vH3yQbzfdZsxY9PpHH8Gnn8Iqq8Cqqy68bNhWWSVvK60E\nq60G662Xr6+8cr5s2FZu1wmLuq82Q1RKaf927HNUecqRJEmt+eQTmDIF3nln4TZlCkydCu+/n8NS\nw+X06TnkrLVW3tZcE9ZYY+G26ab5cvXVF35t9dXzY1ZcMY8kqWWeO6+K1dXVFV1CzfEz73p+5l3P\nz7zzzJwJb70FEybk7a23YNIkePHFOm68MQemWbNgnXVg3XUXbuusA1ttlYNSr16Lhqal/U3fadpc\nbLNsLxSRnBMlSapln34Kr70Gr74K48cvGpYmTMijTBtsAH37Lrzs3XthWFpvvRyMlvKkbZ0uItqc\nWG6IkiSpjObPh9dfz0Gp6fbOOzkc9esHX/oSbLjhoqGpZ09baJXCECVJUidJCd5+G557btHtlVfy\nqFG/fgu3TTbJlxtsYHutuzBESZJUBvPm5YA0ahSMHbswMC23HHzta4tum2+eJ2WrezNESZK0mFKC\nN96AkSPz9uST8OyzufW2zTaw5ZYLA9MXvlB0teoshihJktowZw48/jgMH54D08iRsOyysO220L9/\nvvzGN/L6SKodhihJkpqYOzcHpYcegocfzi26LbaAAQNyYNp2W1h//aKrVNEMUZKkmrdgATz99MLQ\n9PjjeZL3zjvnbYcd8urbUmOGKElSTfroI7jnHrj9drj77ry+UkNo2nHHvNaS1BpDlCSpZkyaBHfc\nkbfHH4ftt4fBg+E737E9p8VniJIkVa2U4Pnn82jTbbflI+oGDcrBaY89bNFpyRiiJElVZ+JEuOYa\nuOqqfJqUwYPztsMOsMwyRVenatGeEOW6qZKkijdrFtxySw5OY8bAv/87XH45bLedp0lRcRyJkiRV\npPr6fDTdVVfllt2AAXDIIXmO0/LLF12dqp3tPElStzNpElx8cQ5PPXvCoYfC/vvD2msXXZlqie08\nSVK3MXIknHsu3HsvHHQQ3HlnXgRTqlSGKElSYebPh1tvzeFp8mQ4+ug8CrXaakVXJrXNECVJ6nIz\nZsCll8IFF0DfvnDssfkIu6X9raRuxG9XSVKXeeMNOOccuPbavKbTzTfDN79ZdFVSxyxVdAGSpOo3\neTL89Kc5MK28Mjz3XF7ryQCl7swQJUnqNO+/D8cdB1/9Kqy0Erz8MpxxhqdhUXUwREmSyu7DD+HX\nv4ZNN4XZs/PpWX7/e+jVq+jKpPIxREmSyubjj+HMM6FfP5gwAUaPhosugvXWK7oyqfwMUZKkJTZ/\nPlx4YQ5Po0fDI4/AlVfCRhsVXZnUeTw6T5K0RJ54Ao48Mq8uftddsPXWRVckdQ1DlCSpQ6ZNg//6\nrxyczj4b9tvPkwGrttjOkyQtlvp6uPxy2HxzWGEFGDcun9vOAKVa40iUJKndxo7N6z3Nmwd33w1f\n/3rRFUnFcSRKktSmmTPzqVl23RUOPhgef9wAJRmiJEmtuuuu3LqbNi2v93TEEdCjR9FVScWznSdJ\natbs2fCrX+UQdfXVUFdXdEVSZXEkSpL0Oc88k89rN316vm6Akj7PECVJ+kx9PZxzDuy+O5x8Mlx3\nHay+etFVSZXJdp4kCYDJk+HQQ3Mb78knXW1caosjUZIkbr01H203YEA+ZYsBSmqbI1GSVMM+/hh+\n+Ut48MEcpP7lX4quSOo+HImSpBr18svwjW/A3LkwZowBSlpchihJqkH33JNbd7/6FVx5Jay6atEV\nSd2P7TxJqiEp5aPvzj47t++2377oiqTuyxAlSTXik0/gJz+BZ5+FESOgb9+iK5K6N9t5klQD3nkH\ndtopL1/w2GMGKKkcDFGSVOVGj4b+/WHQILjhBlhppaIrkqqD7TxJqmLXXw9HHw1/+Qt873tFVyNV\nF0OUJFWh+no45ZQcoh58ELbYouiKpOpjiJKkKjN3Lhx8cJ4HNXIkrLVW0RVJ1ckQJUlVZPZs2Htv\nWG45uO8+WH75oiuSqlebE8sjYmBEvBQRr0bECc3cv1pE/D0inomI5yPisE6pVJLUqg8/hD32yCNP\nN91kgJI6W6shKiJ6AH8EBgKbA/tHxGZNdvtP4PmU0lZAHXB2RDjCJUldaOrUvITBVlvlFciX9qew\n1OnaGonqD4xPKb2ZUpoHDAUGN9mnHmg4YcCqwLSU0vzylilJasnEifkULnvuCeefD0u5eI3UJdr6\nr7Y+MLHR7UmlrzX2R2DziJgMPAv8vHzlSZJa88orOUAdcQT89rcQUXRFUu1oK0SldjzHQODplNJ6\nwFbAhRGxyhJXJklq1TPPQF0dnHoqHHNM0dVItaetrvnbQJ9Gt/uQR6MaOww4AyCl9FpEvAFsCoxu\n+mRDhgz57HpdXR11dXWLW68kCRg+HL7/fbjwQvjBD4quRur+hg0bxrBhwxbrMZFSy4NNpQniLwO7\nAJOBkcD+KaVxjfa5CHg3pfSbiFgbeArYIqX0QZPnSq29liSpfe67Dw48EK65Jh+NJ6n8IoKUUqsN\n8lZHolJK8yPiKOBeoAdwWUppXEQcUbr/z8B/A1dGxFgggOObBihJUnk88AAcdBDcdhtsv33R1Ui1\nrdWRqLK+kCNRkrREhg+H734XbrklTyaX1HnaMxLlgbCS1A08/XQ+gfA11xigpEphiJKkCvfii/Dt\nb8Of/+wcKKmSGKIkqYK99hrsvjv8/vd5JEpS5TBESVKFmjQJdt0VTjstTyaXVFkMUZJUgaZOzQHq\nqKPg8MOLrkZScwxRklRhpk/PLbz99oNjjy26GkktcYkDSaogM2fCbrvlNaDOOstz4UlFac8SB4Yo\nSaoQc+bAoEGwySZw8cUGKKlIhihJ6ibq62GffWCZZfJaUD16FF2RVNuW+LQvkqSuceKJeTL5/fcb\noKTuwhAlSQW75BK49VZ44glYbrmiq5HUXrbzJKlA998PBx8M//wn9OtXdDWSGtjOk6QK9sILcOCB\ncNNNBiipO3KdKEkqwLvvwp57wjnnwI47Fl2NpI4wRElSF5szBwYPhkMO8XQuUnfmnChJ6kL19bDv\nvrDssnkpA9eCkiqTc6IkqcKcdBJMmZInlBugpO7NECVJXeTSS+Hmm/NSBssvX3Q1kpaU7TxJ6gIP\nPggHHJCXMthkk6KrkdSW9rTznFguSZ3stddygLrhBgOUVE0MUZLUiWbPhu9/H049Ferqiq5GUjnZ\nzpOkTpJSXsYA4KqrnEgudScenSdJBbrwQhg7Nk8kN0BJ1ceRKEnqBMOHw/e+lwPUF79YdDWSFpcT\nyyWpAFOm5AU1r7jCACVVM0OUJJXRvHmwzz7wox/Bt79ddDWSOpPtPEkqo1/+El55Bf7+d1jKP1Ol\nbsuJ5ZLUha6/Hu64A0aNMkBJtcCRKEkqg+eeg513zufE22qroquRtKScWC5JXeDDD/OCmuecY4CS\naokjUZK0BOrr81IGffrAH/9YdDWSysU5UZLUyc46C6ZOhRtvLLoSSV3NkShJ6qAnn4S99soTyfv2\nLboaSeXknChJ6iQzZsD++8PFFxugpFrlSJQkLaaUYL/9oGdPuOiioquR1BmcEyVJneCyy2DcuNzO\nk1S7HImSpMXw4ovwrW/Bo4/CZpsVXY2kzuKcKEkqozlzchvvjDMMUJIciZKkdvvpT2HaNBg6FKLV\nv08ldXfOiZKkMrnlFrjnHhgzxgAlKXMkSpLaMGEC9O+fTy687bZFVyOpKzgnSpKW0Pz5cOCBcMwx\nBihJizJESVIrfvMbWHFF+NWviq5EUqVxTpQkteDhh/OaUE8/DUv5J6ekJvyxIEnN+OADOOQQuOIK\nWGedoquRVImcWC5JTTSc1mXddeG884quRlIRXOJAkjrg+uvhuefgyiuLrkRSJWuznRcRAyPipYh4\nNSJOaGGfuogYExHPR8SwslcpSV1k4kT4xS/gmmtghRWKrkZSJWu1nRcRPYCXgV2Bt4FRwP4ppXGN\n9lkdGA7skVKaFBG9UkrvN/NctvMkVbT6ethtN9hlFzjppKKrkVSkcqwT1R8Yn1J6M6U0DxgKDG6y\nzwHAzSmlSQDNBShJ6g7OPx8++QSOP77oSiR1B22FqPWBiY1uTyp9rbF+wJoR8XBEjI6Ig8tZoCR1\nhRdegNNPh6uugqWdLSqpHdr6UdGe/tsywNeBXYAVgSciYkRK6dWmOw4ZMuSz63V1ddTV1bW7UEnq\nLHPnwkEHwRlnwBe/WHQ1koowbNgwhg0btliPaWtO1HbAkJTSwNLtE4H6lNKZjfY5AVghpTSkdPtS\n4J6U0k1Nnss5UZIq0kknwfPPw+23e3JhSVk55kSNBvpFxIYRsSywL3BHk31uB3aIiB4RsSKwLfBi\nR4uWpK702GN5Qc1LLjFASVo8rbbzUkrzI+Io4F6gB3BZSmlcRBxRuv/PKaWXIuIeYCxQD1ySUjJE\nSap4M2fmVckvvhjWXrvoaiR1N65YLqlm/ehH+fLSS4utQ1LlccVySWrB7bfnEww/80zRlUjqrhyJ\nklRzpk6FLbeEm26C7bcvuhpJlag9I1GGKEk1JSXYe2/YZBP43/8tuhpJlcp2niQ1cd118Oqr+STD\nkrQkHImSVDMmT4att4a774avf73oaiRVsnKsEyVJVSElOPxw+MlPDFCSysN2nqSacOWV8PbbcMst\nRVciqVrYzpNU9SZOzKNPDz4IW2xRdDWSugPbeZJqXkp5Uc1f/MIAJam8DFGSqtoll8AHH8AJJxRd\niaRqYztPUtV6803YZht45BHYfPOiq5HUndjOk1Sz6uvhhz+E4483QEnqHIYoSVXpoovg00/hmGOK\nrkRStbKdJ6nqjB8P220Hjz+eT+8iSYvLdp6kmrNgARx2GJxyigFKUucyREmqKn/4A/ToAUcfXXQl\nkqqd7TxJVePll2H77eHJJ+GLXyy6Gkndme08STVjwYJ8NN6QIQYoSV3DECWpKpx3Hiy3HPz0p0VX\nIqlW2M6T1O29/DLssENu4228cdHVSKoGtvMkVb3GbTwDlKSuZIiS1K2de25u4x15ZNGVSKo1tvMk\ndVsvvZTbeCNHOgolqbxs50mqWg1tvN/8xgAlqRiGKEnd0jnnwPLL28aTVBzbeZK6Hdt4kjqb7TxJ\nVcc2nqRKYYiS1K3YxpNUKWznSeo2xo2DAQNg1CjYaKOiq5FUzWznSaoaDW283/7WACWpMhiiJHUL\nZ58NK6wAP/lJ0ZVIUmY7T1LFe+EF+Na3YPRo2HDDoquRVAts50nq9ubNg0MPhdNPN0BJqiyGKEkV\n7cwzoWdPOPzwoiuRpEXZzpNUsZ59FnbdFZ5+Gvr0KboaSbXEdp6kbmvu3NzG+7//M0BJqkyGKEkV\n6Xe/g9694bDDiq5Ekpq3dNEFSFJTTz0FF18MzzwD0epguiQVx5EoSRXl009zG+/cc2G99YquRpJa\n5sRySRXlxBPh5Zfh5psdhZJUnPZMLLedJ6lijBgBV1yRj8ozQEmqdLbzJFWEOXPyJPILLoC11y66\nGklqm+08SRXh2GNh0iS44YaiK5Ek23mSuonHHoPrr4exY4uuRJLaz3aepELNmpXbeBddBL16FV2N\nJLWf7TxJhTriiHyS4csvL7oSSVrIdp6kinbnnXDfffloPEnqbtps50XEwIh4KSJejYgTWtlvm4iY\nHxHfL2+JkqrRe+/B4YfDVVfBqqsWXY0kLb5WQ1RE9AD+CAwENgf2j4jNWtjvTOAewNVdJLUqpRyg\nDjkEBgwouhpJ6pi22nn9gfEppTcBImIoMBgY12S/nwE3AduUu0BJ1efKK+GNN2Do0KIrkaSOaytE\nrQ9MbHR7ErBt4x0iYn1ysNqZHKKcPS6pRW+8AccfDw8/DMstV3Q1ktRxbYWo9gSi84D/SimliAha\naecNGTLks+t1dXXU1dW14+klVYsFC3IL78QT4atfLboaSVpo2LBhDBs2bLEe0+oSBxGxHTAkpTSw\ndPtEoD6ldGajfV5nYXDqBcwGfpxSuqPJc7nEgVTjzjwT7r0XHngAlnKVOkkVrD1LHLQVopYGXgZ2\nASYDI4H9U0pN50Q17H8F8PeU0i3N3GeIkmrYM8/A7rvD6NHQt2/R1UhS65Z4naiU0vyIOAq4F+gB\nXJZSGhcRR5Tu/3PZqpVUtT75BA46CM45xwAlqXq4YrmkTnfssTBxYj65cLgIiqRuwBXLJRXuoYdy\neHr2WQOUpOri1E5JnWbGDPjhD+Gyy6Bnz6KrkaTysp0nqVOkBPvuC2uvDRdcUHQ1krR4bOdJKsyl\nl8LLL+dz40lSNXIkSlLZvfAC1NXBo4/CZp8726YkVb72jEQ5J0pSWc2Zk9t4Z55pgJJU3RyJklRW\nRx4JH34I117r0XiSui/nREnqUjffDPfdB2PGGKAkVT9HoiSVxYQJsM02cOed0L9/0dVI0pJxTpSk\nLjF/PhxwABx/vAFKUu0wRElaYkOGwCqrwDHHFF2JJHUd50RJWiIPPQRXXAFPPw1L+WeZpBrijzxJ\nHfbee3DIIfDXv+aVySWpljixXFKH1NfDnnvCllvCGWcUXY0klZcTyyV1mvPOg+nT4be/LboSSSqG\nc6IkLbbHHssrko8YAcssU3Q1klQMR6IkLZZ33smndbnySthoo6KrkaTiGKIktdu8ebDPPnDEEfBv\n/1Z0NZJULCeWS2q3n/8cXn8dbr/d5QwkVTfPnSepbK67Du66C0aPNkBJEjgSJakdxo6FXXaBBx+E\nLbYouhpJ6nwucSBpic2YAXvvnZc0MEBJ0kKORElqUX09DB6cj8I7//yiq5GkruOcKElL5PTT84Ka\nN99cdCWSVHkMUZKadffdcPHFeSL5sssWXY0kVR5DlKTPeeMNOOywPAK17rpFVyNJlcmJ5ZIWMWdO\nnkh+0kmwww5FVyNJlcuJ5ZI+U18P++2X23dXXw3R6pRKSapeTiyXtFhOPhkmT4YHHjBASVJbDFGS\nALj8crjxRhgxApZfvuhqJKny2c6TxIMPwgEHwKOPwqabFl2NJBXPdp6kNo0bB/vvD3/7mwFKkhaH\nR+dJNWzqVPj2t+H3v4e6uqKrkaTuxRAl1ag5c/IpXQ44AA49tOhqJKn7cU6UVIPq63MLLwKuuw6W\n8s8pSVqEc6IkNevUU2HiRHjoIQOUJHWUIUqqMVdcAUOHupSBJC0p23lSDXnoobwi+SOPwGabFV2N\nJFWu9rTzHMiXasRTT+UANXSoAUqSysEQJdWAF17ISxn85S+w885FVyNJ1cEQJVW58eNh993h7LPh\nu98tuhpJqh6GKKmKTZwIu+4Kv/41HHhg0dVIUnUxRElV6t13c4D62c/g8MOLrkaSqo8hSqpCH3wA\nu+2WF9Q89tiiq5Gk6uQSB1KVmTkzj0DtsAOcdVZelVyStHjas8SBIUqqIrNnw6BB8OUvw5/+ZICS\npI4q2zpRETEwIl6KiFcj4oRm7j8wIp6NiLERMTwituho0ZI6Zu5c+MEPoHdvuOgiA5QkdbY2R6Ii\nogfwMrAr8DYwCtg/pTSu0T7/AryYUvowIgYCQ1JK2zV5HkeipE4yf35eSHPBArjxRljaEzpJ0hIp\n10hUf2B8SunNlNI8YCgwuPEOKaUnUkoflm4+CfTuSMGSFt+nn8K++8KsWXk1cgOUJHWN9oSo9YGJ\njW5PKn2tJf8B/GNJipLUPh9/DN/5Tm7d3X47LLdc0RVJUu1oz9+s7e7BRcROwP8Dtu9wRZLaZfr0\nfCqXL385n87FEShJ6lrt+bH7NtCn0e0+5NGoRZQmk18CDEwpTW/uiYYMGfLZ9bq6Ourq6hajVEkN\npkyBPfajb8/2AAALEklEQVSAXXbJyxgs5YpvkrREhg0bxrBhwxbrMe2ZWL40eWL5LsBkYCSfn1je\nF3gIOCilNKKF53FiuVQGEybkdaAOOQROOcWj8CSpM7RnYnmbI1EppfkRcRRwL9ADuCylNC4ijijd\n/2fgNGAN4E+Rf6LPSyn1X9I3IGlRL72UTyZ83HFw9NFFVyNJtc3FNqVu4umn8xyoM8/Mo1CSpM5T\nlpEoScV79NG8kOZf/gLf/W7R1UiSwBMQSxXvrrtygLruOgOUJFUSQ5RUoVKC886DH/0I7rgjTyaX\nJFUO23lSBfr0UzjySHjqKRgxAjbYoOiKJElNORIlVZgpU2CnneCjj2D4cAOUJFUqQ5RUQZ5+Gvr3\nzwtp/u1vsPLKRVckSWqJ7TypQtxwAxx1FFx8Mey9d9HVSJLaYoiSClZfD6edBtdcA/ffD1ttVXRF\nkqT2MERJBZo5Ew4+GKZNg5Ej4QtfKLoiSVJ7OSdKKsgrr8C//iustRY8+KABSpK6G0OU1MVSgksu\nyQHqP/8zr0K+7LJFVyVJWly286QuNG0a/PjH8Prr+VQum29edEWSpI5yJErqIg88AFtuCRtvDE8+\naYCSpO7OkSipk336KZx8MgwdCldcAbvtVnRFkqRyMERJnejFF+GAA2DDDeGZZ6BXr6IrkiSVi+08\nqROkBBddBDvumCeP33qrAUqSqo0jUVKZvfFGDk5Tp+Zz3226adEVSZI6gyNRUpl8+imcfjpssw0M\nGACPP26AkqRq5kiUVAYPPphHn/r1g1GjYKONiq5IktTZDFHSEnjnHTj22Ny2O/98GDy46IokSV3F\ndp7UAQsWwAUXwBZbwAYb5KPwDFCSVFsciZIW08iRcOSRsMoq8MgjLpopSbXKkSipncaPh4MOyiNO\nv/gFPPywAUqSapkhSmrDxIlw+OGw3Xb5aLtXXoGDD4aIoiuTJBXJECW14N1384jTVltBz545PJ16\nam7jSZJkiJKamD4dTjopt+pSghdegDPOgDXXLLoySVIlMURJJbNmwe9+l9d6eu89GDMG/vAHWGed\noiuTJFUij85TzZs4ES68EC67DHbbDZ54IgcpSZJa40iUatbIkbD//rDllvDJJzBiBFx3nQFKktQ+\njkSppsyfD7feCueem1cbP/pouPhiWG21oiuTJHU3hijVhBkz4NJL8yrjffvCccfBXnvB0v4PkCR1\nkL9CVLXq6+HRR+Gqq+C222DQILj5ZvjmN4uuTJJUDSKl1DUvFJG66rVU2159NQenq6+GVVeFQw+F\nAw/0KDtJUvtFBCmlVpdVdiRKVWH6dLjhhhyeXn8dDjggjz5ttVXRlUmSqpUjUeq2Zs+G++6Da6/N\nl3vsAYccki+XWabo6iRJ3Vl7RqIMUepWpk6FO++E22/PJwD+5jdhn31g331hjTWKrk6SVC0MUaoK\nr7ySQ9Ptt8Nzz8Huu8PgwXmiuKdikSR1BkOUuqU5c/Kq4ffdl4PTjBl5OYLvfhd22gmWX77oCiVJ\n1c4QpW5h7lwYNQoeeihvo0bB174Gu+wC3/kObLMNLOXa+pKkLmSIUkVasCCf3LchNA0fnk+1svPO\neaRpwIC8NIEkSUUxRKkiTJqUz1P35JP58qmnoE+fhaHpW9+Cnj2LrlKSpIUMUepyH30Eo0cvGprm\nzoX+/WHbbfPlNtsYmiRJlc0QpU4zfz6MH5+Plmu8vfNOXuCycWjaaCOIVr8NJUmqLIYoLbH58+HN\nN/OpVF58EcaOzWHppZdg3XXzBPDGW79+ntRXktT9GaLULgsWwMSJOSg13SZMyOec69cPNtsMttgi\nh6WvfAVWXrnoyiVJ6hyGKAF5TtLEiTkQvfXWopcTJuSJ3z175qDUsG2ySb7ceGPXZZIk1R5DVJWb\nPz+fBuWdd1reJk6EadNgvfVggw2gb9/PX/btCyuuWPS7kSSpcpQlREXEQOA8oAdwaUrpzGb2OR/4\nN2A2cFhKaUwz+xiiWpFSXql7xowcet57L2/vv9/85dSpeb+ePfPcpKbbeuvly96982WPHkW/Q0mS\nuo/2hKhWpwBHRA/gj8CuwNvAqIi4I6U0rtE+g4AvpZT6RcS2wJ+A7Za4+m5m3rx8eP9HH8HMmQuv\nN94+/BCmT89Bafr0z2+QT6LbsyestVbeevXKl5tuCttvv+jX11679Uncw4YNo3fvui55/8qGDRtG\nXV1d0WXUFD/zrudn3vX8zCtTW8dR9QfGp5TeBIiIocBgYFyjffYC/gqQUnoyIlaPiLVTSu92Qr1d\nas4cOOEEmDULPv44by1dX7Agr7K9yir5svHW+GubbAKrr57DUtNthRXKW7//6bqen3nX8zPven7m\nXc/PvDK1FaLWByY2uj0J2LYd+/QGun2IWnpp+NKXYKWV8pFoK63U8vXllnMtJEmSaklbIaq9k5ia\nxoeqmPy0zDJw9NFFVyFJkipRqxPLI2I7YEhKaWDp9olAfePJ5RFxMTAspTS0dPsl4FtN23kRURXB\nSpIk1YYlmlgOjAb6RcSGwGRgX2D/JvvcARwFDC2FrhnNzYdqqxBJkqTupNUQlVKaHxFHAfeSlzi4\nLKU0LiKOKN3/55TSPyJiUESMBz4GftjpVUuSJBWsyxbblCRJqiZLdeWLRcTPImJcRDwfEZ9btFOd\nIyKOjYj6iFiz6FpqQUT8vvR9/mxE3BIRqxVdUzWKiIER8VJEvBoRJxRdTy2IiD4R8XBEvFD6Oe6h\nN10gInpExJiI+HvRtdSC0lJNN5V+jr9YmqrUrC4LURGxE3lNqS1SSl8Fzuqq165lEdEH2A2YUHQt\nNeQ+4CsppS2BV4ATC66n6jRaCHggsDmwf0RsVmxVNWEe8MuU0lfIiyr/p597l/g58CJVcuR7N/AH\n4B8ppc2ALVh0bcxFdOVI1JHAGSmleQAppfe68LVr2TnA8UUXUUtSSvenlOpLN58kr5um8vpsIeDS\nz5SGhYDViVJKU1JKz5SuzyL/clmv2KqqW0T0BgYBl/L55YRUZqXOwYCU0uWQ54anlD5saf+uDFH9\ngB0jYkREDIuIb3bha9ekiBgMTEopjS26lhr2/4B/FF1EFWpukd/1C6qlJpWO2t6a/IeCOs+5wK+A\n+rZ2VFlsBLwXEVdExNMRcUlErNjSzm0tcbBYIuJ+YJ1m7jq59FprpJS2i4htgL8BG5fz9WtRG5/5\nicDujXfvkqJqQCuf+0kppb+X9jkZmJtSuq5Li6sNtjUKFBErAzcBPy+NSKkTRMSewNSU0piIqCu6\nnhqxNPB14KiU0qiIOA/4L+C0lnYum5TSbi3dFxFHAreU9htVmujcM6U0rZw11JqWPvOI+Co5UT8b\n+Xw0vYGnIqJ/SmlqF5ZYlVr7XgeIiMPIQ/C7dElBtedtoE+j233Io1HqZBGxDHAzcE1K6bai66ly\n/wrsFRGDgOWBVSPiqpTSIQXXVc0mkTs4o0q3byKHqGZ1ZTvvNmBngIjYBFjWANV5UkrPp5TWTilt\nlFLaiPyN8XUDVOeLiIHk4ffBKaVPiq6nSn22EHBELEteCPiOgmuqepH/IrsMeDGldF7R9VS7lNJJ\nKaU+pZ/h+wEPGaA6V0ppCjCxlFMAdgVeaGn/so5EteFy4PKIeA6YC/iN0LVsf3SdC4BlgftLo4BP\npJR+WmxJ1aWlhYALLqsWbA8cBIyNiDGlr52YUrqnwJpqiT/Hu8bPgGtLf6C9RiuLiLvYpiRJUgd0\n6WKbkiRJ1cIQJUmS1AGGKEmSpA4wREmSJHWAIUqSJKkDDFGSJEkdYIiSJEnqAEOUJElSB/x/ho/k\nvL3l4tgAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x59aef28>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import numpy as np\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "f, ax = plt.subplots(figsize=(10, 5))\n",
    "rng = np.linspace(-5, 5)\n",
    "log_f = np.apply_along_axis(lambda x:1 / (1 + np.exp(-x)), 0, rng)\n",
    "ax.set_title(\"Logistic Function between [-5, 5]\")\n",
    "ax.plot(rng, log_f);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们用`make_classification`方法创建一个数据集来进行分类："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_classification\n",
    "X, y = make_classification(n_samples=1000, n_features=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to do it..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`LogisticRegression`对象和其他线性模型的用法一样："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "lr = LogisticRegression()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们将把前面200个数据作为训练集，最后200个数据作为测试集。因为这是随机数据集，所以用最后200个数据没问题。但是如果处理具有某种结构的数据，就不能这么做了（例如，你的数据集是时间序列数据）："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X_train = X[:-200]\n",
    "X_test = X[-200:]\n",
    "y_train = y[:-200]\n",
    "y_test = y[-200:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在本书后面的内容里，我们将详细介绍交叉检验。这里，我们需要的只是用逻辑回归拟合模型。我们会关注训练集的预测结果，就像测试集预测结果一样。经常对比两个数据集预测正确率是个好做法。通常，你在训练集获得的结果更好；模型在测试集上预测失败的比例也至关重要："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "lr.fit(X_train, y_train)\n",
    "y_train_predictions = lr.predict(X_train)\n",
    "y_test_predictions = lr.predict(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在我们有了预测值，让我们看看预测的效果。这里，我们只简单看看预测正确的比例；后面，我们会详细的介绍分类模型效果的评估方法。\n",
    "\n",
    "计算很简单，就是用预测正确的数量除以总样本数："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.89375000000000004"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(y_train_predictions == y_train).sum().astype(float) / y_train.shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "测试集的效果是："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.90500000000000003"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(y_test_predictions == y_test).sum().astype(float) / y_test.shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到，测试集的正确率和训练集的结果差不多。但是实际中通常差别很大。\n",
    "\n",
    "现在问题变成，怎么把逻辑函数转换成分类方法。\n",
    "\n",
    "首先，线性回归希望找到一个线性方程拟合出给定自变量$X$条件下因变量$Y$的期望值，就是$E(Y|X)=x \\beta$。这里$Y$的值是某个类型发生的概率。因此，我们要解决的分类问题就是$E(p|X)=x \\beta$。然后，只要阈值确定，就会有$Logit(p) = X \\beta$。这个理念的扩展形式可以构成许多形式的回归行为，例如，泊松过程（Poisson）。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## There's more..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面的内容你以后肯定会遇到。一种情况是一个类型与其他类型的权重不同；例如，一个能可能权重很大，99%。这种情况在分类工作中经常遇到。经典案例就是信用卡虚假交易检测，大多数交易都不是虚假交易，但是不同类型误判的成本相差很大。\n",
    "\n",
    "让我们建立一个分类问题，类型$y$的不平衡权重95%，我们看看基本的逻辑回归模型如何处理这类问题："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0562"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X, y = make_classification(n_samples=5000, n_features=4, weights=[.95])\n",
    "sum(y) / (len(y)*1.) #检查不平衡的类型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "建立训练集和测试集，然后用逻辑回归拟合："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X_train = X[:-500]\n",
    "X_test = X[-500:]\n",
    "y_train = y[:-500]\n",
    "y_test = y[-500:]\n",
    "\n",
    "lr.fit(X_train, y_train)\n",
    "y_train_predictions = lr.predict(X_train)\n",
    "y_test_predictions = lr.predict(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在我们在看看模型拟合的情况："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.96711111111111114"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(y_train_predictions == y_train).sum().astype(float) / y_train.shape[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.96799999999999997"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(y_test_predictions == y_test).sum().astype(float) / y_test.shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "结果看着还不错，但这是说如果我们把一个交易预测成正常交易（或者称为类型0），那么我们有95%左右的可能猜对。如果我们想看看模型对类型1的预测情况，可能就不是那么好了："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.5"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(y_test[y_test==1] == y_test_predictions[y_test==1]).sum().astype(float) / y_test[y_test==1].shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果相比正常交易，我们更关心虚假交易；那么这是由商业规则决定的，我们可能会改变预测正确和预测错误的权重。\n",
    "\n",
    "通常情况下，虚假交易与正常交易的权重与训练集的类型权重的倒数一致。但是，因为我们更关心虚假交易，所有让我们用多重采样（oversample）方法来表示虚假交易与正常交易的权重。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LogisticRegression(C=1.0, class_weight={0: 0.15, 1: 0.85}, dual=False,\n",
       "          fit_intercept=True, intercept_scaling=1, max_iter=100,\n",
       "          multi_class='ovr', penalty='l2', random_state=None,\n",
       "          solver='liblinear', tol=0.0001, verbose=0)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr = LogisticRegression(class_weight={0: .15, 1: .85})\n",
    "lr.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们再预测一下结果："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "y_train_predictions = lr.predict(X_train)\n",
    "y_test_predictions = lr.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.70833333333333337"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(y_test[y_test==1] == y_test_predictions[y_test==1]).sum().astype(float) / y_test[y_test==1].shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "但是，这么做需要付出什么代价？让我们看看："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.93999999999999995"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(y_test_predictions == y_test).sum().astype(float) / y_test.shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到，准确率降低了3%。这样是否合适由你的问题决定。如果与虚假交易相关的评估成本非常高，那么它就能抵消追踪虚假交易付出的成本。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
